
Phase 1b: MSE-optimal bandwidth selector (CCF 2018 DPI)#335

Merged
igerber merged 17 commits into main from did-no-untreated
Apr 19, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Add diff_diff.mse_optimal_bandwidth + BandwidthResult dataclass — the Phase 1b MSE-optimal bandwidth selector for the HAD (Heterogeneous Adoption Design) nonparametric estimator.
  • Port nprobust 0.5.0 (SHA 36e4e53) lpbwselect(bwselect="mse-dpi") into diff_diff/_nprobust_port.py (~500 lines): kernel_W, qrXXinv, lprobust_res, lprobust_vce, lprobust_bw, lpbwselect_mse_dpi. Three-stage DPI with four nested lprobust.bw calls at orders q+1, q+2, q, p.
  • Parity harness: benchmarks/R/generate_nprobust_golden.R produces benchmarks/data/nprobust_mse_dpi_golden.json with every stage bandwidth + per-stage (V, B1, B2, R). Python port matches R at 0.0000% relative error on all five stage bandwidths across three deterministic DGPs (uniform, Beta(2,2), half-normal).
  • 149 tests pass (67 new for Phase 1b + 82 existing local-linear regression). Public API scope is intentionally restricted to the HAD configuration (vce="nn", nnmatch=3, p=1, deriv=0, interior=False); the broader port surface is available via _nprobust_port for Phase 1c.
  • Phase 1b checkbox ticked in docs/methodology/REGISTRY.md with explicit notes about weights= being unsupported (no nprobust parity anchor) and the public API scope restriction.

Methodology references (required if estimator / math changes)

  • Method name(s): Calonico-Cattaneo-Farrell (2018) MSE-DPI direct-plug-in bandwidth selector for local-polynomial boundary estimation; used by the HeterogeneousAdoptionDiD (HAD) estimator of de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026, arXiv:2405.04465v6) Design 1'.
  • Paper / source link(s):
    • Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2018). On the effect of bias estimation on coverage accuracy in nonparametric inference. JASA 113(522), 767-779.
    • Reference implementation: nprobust 0.5.0 CRAN package (GitHub SHA 36e4e532d2f7d23d4dc6e162575cca79e0927cda, github.com/nppackages/nprobust); the R source at npfunctions.R:187-288 (lprobust.bw) and :498-607 (lpbwselect.mse.dpi) is cited inline in the Python port.
    • de Chaisemartin et al. (2026) — docstring cross-references; registry entry at docs/methodology/REGISTRY.md#heterogeneousadoptiondid.
  • Any intentional deviations from the source (and why):
    • weights= raises NotImplementedError in the public wrapper. nprobust::lpbwselect has no weight argument, so there is no parity anchor. Deferred to Phase 2 (survey-design adaptation). Documented in REGISTRY and docstring.
    • Public API scope restricted to HAD Phase 1b (vce="nn", nnmatch=3, p=1, deriv=0, interior=False are hard-coded in mse_optimal_bandwidth). The underlying port supports a broader surface but those paths are not parity-tested. Callers needing the broader surface go through diff_diff._nprobust_port.lpbwselect_mse_dpi. Documented in REGISTRY and docstring; pinned by a behavioral test.
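The scope restriction above can be sketched as a thin wrapper that hard-codes the HAD configuration and rejects `weights=`. This is an illustrative stub (names and the dummy return value are hypothetical), not the actual `diff_diff` source:

```python
def _lpbwselect_mse_dpi(d, y, eval_point, *, p, deriv, vce, nnmatch, interior):
    # Stand-in for the private port; returns a dummy bandwidth here.
    return 0.5

def mse_optimal_bandwidth(d, y, boundary=0.0, weights=None):
    """Public wrapper: pins the HAD Phase 1b configuration."""
    if weights is not None:
        # No nprobust parity anchor for weights; deferred to Phase 2.
        raise NotImplementedError("weights= is unsupported in Phase 1b")
    return _lpbwselect_mse_dpi(
        d, y, eval_point=boundary,
        p=1, deriv=0, vce="nn", nnmatch=3, interior=False,
    )
```

A behavioral test can then pin the wrapper's output against an explicit call to the private port with the same fixed arguments, so any future scope expansion fails loudly.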

Validation

  • Tests added/updated:
    • tests/test_nprobust_port.py (NEW, 14 tests) — unit tests for every ported helper (kernel, Cholesky-inverse, NN residuals, clustered/unclustered meat, single-stage lprobust_bw parity).
    • tests/test_bandwidth_selector.py (NEW, 78 tests via parametrize) — 5-bandwidth parity on 3 DGPs at 1% + per-stage (V, B1, B2) parity + validation (shapes, finiteness, unknown kernel, weights, bwcheck range) + API-scope pin test + kernel-dispatch + boundary + downstream-integration + rate-scaling (G^{-1/5} MC).
    • tests/test_local_linear.py — unchanged; still green as regression anchor for Phase 1a kernels.
  • Backtest / simulation / notebook evidence (if applicable): N/A — Phase 1b ships kernel machinery only; simulation / replication harness (DGPs 1-3 coverage) is Phase 4.
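For reference, the rate-scaling test above targets the standard local-polynomial MSE-optimal rate: for estimating the ν-th derivative with a polynomial of order p, the bandwidth shrinks as n^{-1/(2p+3)}, which for the HAD configuration (p=1, ν=0) gives the G^{-1/5} law checked by Monte Carlo:

```latex
h_{\mathrm{MSE}}
  = \left( \frac{(2\nu + 1)\, V}{2(p + 1 - \nu)\, B^{2}\, n} \right)^{1/(2p+3)}
  \;\propto\; n^{-1/(2p+3)}
  \quad\xrightarrow{\;p=1,\ \nu=0\;}\quad n^{-1/5}.
```

Here V and B are the usual variance and bias constants of the local-polynomial fit; the DPI selector estimates them in pilot stages rather than assuming them known.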

Security / privacy

  • Confirm no secrets/PII in this PR: Yes.

🤖 Generated with Claude Code

igerber and others added 3 commits April 19, 2026 13:19
Port nprobust 0.5.0 (SHA 36e4e53) lpbwselect(bwselect="mse-dpi") in-house
as diff_diff.mse_optimal_bandwidth + BandwidthResult, backed by private
diff_diff._nprobust_port module (kernel_W, lprobust_bw, lpbwselect_mse_dpi).
Three-stage DPI with four lprobust.bw calls at orders q+1, q+2, q, p.

Parity verified at 0.0000% on all five stage bandwidths (c_bw, bw_mp2,
bw_mp3, b_mse, h_mse) across three deterministic DGPs (uniform, Beta(2,2),
half-normal) via benchmarks/R/generate_nprobust_golden.R.

weights= raises NotImplementedError (no nprobust parity anchor); deferred
to Phase 2. vce='nn' is the only verified variant; hc0/hc1/hc2/hc3 paths
are implemented but untested.

144 tests pass (32 new bandwidth-selector/port + 55 existing local_linear
regression + others). Plan at ~/.claude/plans/vectorized-beaming-feather.md
was approved with 1% parity target; actual port achieves essentially exact
parity.

Phase 1b checkbox ticked in docs/methodology/REGISTRY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 fixes from local AI review:
- Add bwcheck validation in lpbwselect_mse_dpi: reject bwcheck < 1 and
  bwcheck > N with clear ValueError instead of IndexError inside the
  sorted-abs indexing.
- Export BandwidthResult and mse_optimal_bandwidth from
  diff_diff/__init__.py::__all__ so the public API propagates through
  star imports and introspection.
- Fix contradictory docstring in lpbwselect_mse_dpi: the HAD case
  (p=1, deriv=0) has even=False, so the R source dispatches to the
  closed-form C$bw branch, NOT to optimize(). Docstring now matches
  the actual control flow.
- 4 new tests in test_bandwidth_selector covering bwcheck > N,
  bwcheck=0, bwcheck=-1, and bwcheck=None on a 5-obs sample.

148 tests pass (up from 144).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AI review flagged that the public wrapper silently hard-codes vce="nn",
nnmatch=3, p=1, deriv=0, interior=False while the underlying port
supports a broader surface. Two options: expose the extra parameters
or document the restriction. Chose to document since Phase 1b plan
explicitly scopes those paths as untested / deferred.

- mse_optimal_bandwidth docstring now has a "Public API scope" section
  describing the HAD Phase 1b restriction and pointing users who need
  the broader surface at diff_diff._nprobust_port.lpbwselect_mse_dpi.
- REGISTRY.md Phase 1b entry gains a "Note (public API scope
  restriction)" documenting the same contract.
- New test test_public_wrapper_fixes_vce_nn_nnmatch_3 pins the
  restriction: the wrapper's output must equal the explicit port call
  with vce="nn", nnmatch=3, p=1, deriv=0, interior=False. Any future
  scope expansion now fails the test and forces a REGISTRY/docstring
  update in lockstep.

150 tests pass (up from 148).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment
⚠️ Needs changes — I found two unmitigated P1 issues in the new bandwidth-selector path: one methodology applicability check is missing, and one new edge-case path still falls through to opaque linear-algebra failures.

Executive Summary

  • P1: mse_optimal_bandwidth() is documented and wired as a boundary-only Phase 1b API, but it never verifies that the supplied boundary is actually the lower support edge. Interior-point or upper-boundary calls can therefore use the wrong selector branch or return a bandwidth that is incompatible with the one-sided fitter.
  • P1: the DPI port never checks that each stage has enough in-window observations / full-rank design before calling Cholesky inversion. Small samples or mispositioned boundaries can still fail with opaque LinAlgErrors.
  • P3: the tiny-sample test explicitly tolerates arbitrary failures instead of pinning the intended user-facing error contract, and the module docstring still says Phase 1b “will add” the selector.
  • The documented deviations in the registry (weights= unsupported; broader _nprobust_port surface not parity-tested outside HAD Phase 1b) are properly disclosed and are not defects.

Methodology

  • Severity P1. Impact: the exported wrapper is explicitly scoped to the HAD boundary case, but it accepts any numeric boundary and always forwards interior=False. The methodology registry says the nonparametric HAD path is for the support infimum (0 in Design 1', min D_2 in the continuous-near-d_lower variant), and the local fitter itself is one-sided to the right of boundary. As written, an interior point or upper-boundary input silently runs the boundary selector anyway. lpbwselect itself exposes interior as a distinct mode, and the R mse-dpi branch switches behavior based on that flag. Refs: diff_diff/local_linear.py, docs/methodology/REGISTRY.md.
    Concrete fix: in mse_optimal_bandwidth(), enforce the Phase 1b lower-boundary contract explicitly (boundary at d.min() within tolerance, and no observations below it), or reject non-boundary / upper-boundary / two-sided-support inputs with a clear ValueError until a documented interior API exists.
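A minimal sketch of the suggested guard (hypothetical helper name; tolerance handling is illustrative):

```python
def check_lower_boundary(d, boundary, tol=1e-12):
    """Reject interior and upper-boundary evaluation points: the Phase 1b
    wrapper is contractually scoped to the lower support edge."""
    d_min = min(d)
    if boundary > d_min + tol:
        raise ValueError(
            f"mse_optimal_bandwidth is lower-boundary only: boundary={boundary} "
            f"lies above d.min()={d_min}; use the private port for interior points"
        )
```

Called before forwarding to the selector, this turns the silent wrong-branch behavior into an immediate, testable error.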

Code Quality

  • Severity P1. Impact: each DPI stage builds its active-window design matrix and sends it directly into qrXXinv() with no explicit support-size or rank check. That means tiny samples, bwcheck=None, or boundary/support misconfiguration still leak opaque Cholesky failures instead of a clear validation error. The test suite currently acknowledges that gap by only asserting “not IndexError” on a tiny-sample call. Refs: diff_diff/_nprobust_port.py, tests/test_bandwidth_selector.py. bwcheck only enlarges the selected bandwidth when it is provided; setting it to None removes that safeguard entirely.
    Concrete fix: before each qrXXinv() call, validate that the active window has at least the required number of observations and full column rank (n_V >= o+1, n_B1 >= o_B+1, n_B2 >= o_B+2, plus rank checks). Raise a targeted ValueError naming the failing stage and suggesting larger G, a valid lower boundary, or a non-None bwcheck.
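The per-stage thresholds suggested above could look like the following sketch (stage names and message text are illustrative, not the port's actual wording):

```python
def check_stage_counts(n_V, n_B1, n_B2, o, o_B):
    """Minimum in-window observation counts per DPI stage, mirroring the
    thresholds proposed above: n_V >= o+1, n_B1 >= o_B+1, n_B2 >= o_B+2."""
    checks = [
        ("variance", n_V, o + 1),
        ("B1", n_B1, o_B + 1),
        ("B2", n_B2, o_B + 2),
    ]
    for name, n, need in checks:
        if n < need:
            raise ValueError(
                f"{name} stage has {n} in-window observations, needs at least "
                f"{need}; try a larger sample, a valid lower boundary, or a "
                "non-None bwcheck"
            )
```

Running this before each qrXXinv() call replaces the opaque Cholesky failure with an error that names the failing stage.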

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate deferred-work item applies here. These two P1 issues are not tracked in TODO.md, so they remain unmitigated.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: diff_diff/local_linear.py still says Phase 1b “will add” the selector, and tests/test_bandwidth_selector.py intentionally accepts any non-IndexError failure on tiny samples. Neither is a blocker by itself, but both weaken the regression contract around the new public API.
    Concrete fix: update the module docstring to reflect that Phase 1b shipped, and once the P1 validation is added, replace the catch-all tiny-sample test with explicit pytest.raises(ValueError, match=...) checks. Also add rejection tests for interior and upper-boundary inputs.

Path to Approval

  1. Add a boundary-applicability check in mse_optimal_bandwidth() that rejects interior-point, upper-boundary, or two-sided-support inputs for the exported Phase 1b API, and add regression tests for boundary=0.5 on U(0,1) data and boundary=d.max().
  2. Add explicit per-stage support/rank guards before qrXXinv() in lprobust_bw() (or a wrapper-level preflight), and update the tiny-sample test to require a clear ValueError on bwcheck=None instead of permitting arbitrary failures.

P1 #1 (methodology): mse_optimal_bandwidth now rejects boundary > d.min()
with a clear ValueError. The Phase 1b wrapper is scoped to the HAD
lower-boundary case (Design 1' with d_0 = 0 or Design 1 continuous-near-
d_lower with d_0 = min D_2). Interior or upper-boundary inputs would
silently run the boundary selector with a symmetric kernel and return
a bandwidth incompatible with the one-sided fitter. The port remains
available for interior / broader surface via
_nprobust_port.lpbwselect_mse_dpi.

P1 #2 (code quality): lprobust_bw validates in-window observation
counts at each of the three local-poly fits before calling qrXXinv:
  - variance: n_V >= o+1
  - B1: n_B1 >= o_B+1
  - B2: n_B2 >= o_B+2
Each guard raises a targeted ValueError naming the failing stage, the
bandwidth, and suggested remediation. Previously these failed with
opaque LinAlgError from Cholesky on under-determined designs.

P3 (doc): local_linear.py module docstring updated to say Phase 1b
"ships" instead of "will add"; tiny-sample test now asserts the new
ValueError contract instead of accepting any non-IndexError failure.

New behavioral tests:
- test_interior_boundary_rejected: boundary=0.5 on U(0,1) rejected
- test_upper_boundary_rejected: boundary=d.max() rejected
- test_boundary_equal_to_min_d_accepted: boundary=min(d) accepted
  (Design 1 continuous-near-d_lower path)
- test_boundary_below_min_d_accepted: boundary=0 with d.min()>0
  accepted (Design 1' path)
- test_bwcheck_none_on_tiny_sample_raises_valueerror: upgraded from
  "catch anything non-IndexError" to pytest.raises(ValueError,
  match="lprobust_bw").

153 tests pass (up from 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5799d427d7eb54eec17104df07d854bd7cb4df65


Overall Assessment

⚠️ Needs changes — the prior interior/upper-boundary P1 is fixed, but I still see two unmitigated P1 issues in the new Phase 1b bandwidth-selector path.

Executive Summary

  • The previous boundary-applicability finding is resolved: the wrapper now rejects interior and upper-boundary inputs, and the tests pin that behavior in diff_diff/local_linear.py:L608-L634 and tests/test_bandwidth_selector.py:L267-L304.
  • The previous tiny-sample failure is only partially resolved: support-count guards were added, but full-rank validation is still missing, so duplicate-support windows can still leak raw Cholesky failures out of qrXXinv().
  • Severity P1: mse_optimal_bandwidth() still accepts lower-boundary mass-point designs even though the Methodology Registry assigns d̲ > 0 mass-point cases to the 2SLS/sample-average path, not the nonparametric CCF bandwidth selector.
  • The documented deviations (weights= unsupported; broader _nprobust_port surface not parity-tested) are properly disclosed in the registry and are not defects.
  • Severity P3: the test suite claims per-stage (V, B1, B2, R) parity, but it only asserts V/B1/B2; the nonzero-R stages are not actually pinned.

Methodology

Affected methods: Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selection for local-polynomial boundary estimation, as used in the HAD nonparametric Equation 7/8 path.

  • Severity P1. Impact: the new public wrapper enforces “lower boundary” but not “continuous near the lower boundary.” The registry explicitly separates the d̲ > 0 mass-point case into a different estimator family: sample averages / 2SLS with instrument 1{D_2 > d̲}, not the nonparametric CCF local-polynomial path. As written, mse_optimal_bandwidth() accepts boundary = min(d) and dispatches straight into lpbwselect_mse_dpi with no check for bunching at d̲, so a lower-boundary mass-point design can silently receive a nonparametric bandwidth for the wrong methodology. See diff_diff/local_linear.py:L517-L537, diff_diff/local_linear.py:L608-L634, docs/methodology/REGISTRY.md:L2172-L2179, docs/methodology/REGISTRY.md:L2228-L2230, and docs/methodology/REGISTRY.md:L2246-L2249.
    Concrete fix: before calling lpbwselect_mse_dpi, detect nontrivial mass at min(d) for the d̲ > 0 case (the registry already plans a >2% modal-min rule) and raise a clear NotImplementedError / ValueError directing users to the Design 1 mass-point 2SLS path until Phase 2 ships.
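The >2% modal-min rule the registry plans could be implemented along these lines (a sketch; the function names and the exact gating on min(d) > 0 are assumptions):

```python
def modal_min_fraction(d):
    """Fraction of observations sitting exactly at min(d)."""
    d_min = min(d)
    return sum(1 for x in d if x == d_min) / len(d)

def check_no_mass_point(d, threshold=0.02):
    """Raise when a d.min() > 0 design has nontrivial mass at its lower
    boundary: that case belongs to the Design 1 mass-point 2SLS path."""
    if min(d) > 0 and modal_min_fraction(d) > threshold:
        raise NotImplementedError(
            "lower-boundary mass point detected; use the Design 1 "
            "mass-point 2SLS path (deferred to Phase 2)"
        )
```

Note that with all-unique doses the modal-min fraction is 1/n, so the 2% threshold only bites for small samples or genuine ties; a production rule would likely pair it with a minimum tie count.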

  • Severity P3. Impact: none. The two intentional deviations called out in the PR body are properly documented in the registry: weights= is unsupported, and the broader private _nprobust_port surface is not parity-tested. I am not treating those as defects. See docs/methodology/REGISTRY.md:L2305-L2305.
    Concrete fix: none.

Code Quality

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The two P1 issues above are not tracked in TODO.md, so they remain unmitigated blockers rather than accepted deferred work.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the tests and PR summary overstate the parity coverage. TestStageDiagnosticsParity says it checks per-stage (V, B1, B2, R), but it only asserts V, B1, and B2; the only R assertion is the trivial scale=0 case for stage_d1, so regressions in stage_d2_R, stage_b_R, or stage_h_R could slip through. See tests/test_bandwidth_selector.py:L76-L117 and tests/test_nprobust_port.py:L222-L243.
    Concrete fix: add golden assertions for R on all four stages and add a direct rank-deficient duplicate-support test so the new error contract is actually covered.

Path to Approval

  1. Add a design-applicability guard in mse_optimal_bandwidth() that rejects d̲ > 0 lower-boundary mass-point inputs and points users to the Design 1 mass-point 2SLS path.
  2. Add full-rank validation around every qrXXinv() call in lprobust_bw() (or catch/rethrow LinAlgError) so duplicate-support windows fail with a clear ValueError.
  3. Extend the tests to assert stage R parity from the golden JSON and to cover the rank-deficient duplicate-support case explicitly.

P1 #1 (methodology): mse_optimal_bandwidth now rejects Design 1
mass-point designs. When boundary > 0 and the modal fraction at
d.min() exceeds the REGISTRY-specified 2% threshold, raise
NotImplementedError pointing to the 2SLS sample-average estimator
per de Chaisemartin et al. (2026) Section 3.2.4. Design 1' with
untreated units at d=0 (boundary=0) is still accepted per Garrett
et al. (2020) application precedent.

P1 #2 (code quality): qrXXinv now catches np.linalg.LinAlgError from
Cholesky and re-raises as ValueError with a targeted message naming
the failing dimension and suggesting remediation. Duplicate-support
windows or other rank-deficient designs now fail with a clear error
instead of leaking LinAlgError out of the port.
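The catch/rethrow contract described here can be sketched in pure Python. A toy Cholesky factorization stands in for NumPy's (the real port catches np.linalg.LinAlgError); the point is that a rank-deficient X'X surfaces as a clear ValueError:

```python
import math

def _cholesky(a):
    # Plain-Python Cholesky; fails on rank-deficient / non-PD input.
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ArithmeticError("matrix not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def qrxxinv_guarded(xtx):
    """Sketch of the error contract: rethrow factorization failures as a
    targeted ValueError (returns the Cholesky factor, not the inverse)."""
    try:
        return _cholesky(xtx)
    except ArithmeticError as exc:
        raise ValueError(
            f"qrXXinv: X'X of dimension {len(xtx)} is rank deficient; check "
            "for duplicate support values in the active window"
        ) from exc
```

Callers further up the stack then see one consistent exception type regardless of which stage's window went singular.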

P3 (tests): Added TestStageDiagnosticsParity::test_R_parity covering
all four stages. Previously only V/B1/B2 were pinned; R (BWreg) was
only trivially checked for stage_d1 (scale=0 -> R=0). Now stage_b
and stage_h R values are explicitly parity-tested at 1% against R
nprobust.

New behavioral tests:
- test_mass_point_design_rejected: 10% mass at 0.1 -> NotImplementedError
- test_continuous_near_d_lower_accepted: uniform(0.1, 1.0) passes
- test_untreated_at_zero_accepted: 15% at d=0 with boundary=0 passes
- test_rank_deficient_design_raises_valueerror: rank-1 X -> ValueError
- R parity on all four stages across 3 DGPs (12 new parametrized cases)

169 tests pass (up from 153).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 03532fc0b22ec2e07687bed7393482b689c46c72


Overall Assessment

⚠️ Needs changes — one unmitigated P1 methodology issue remains in the new public bandwidth-selector path.

Executive Summary

  • The prior interior/upper-boundary issue is fixed: the public wrapper now rejects interior and upper-boundary evaluation points.
  • The prior raw Cholesky failure issue is largely fixed: qrXXinv() now re-raises rank-deficient X'X failures as a clear ValueError, and lprobust_bw() has stage-specific minimum-count guards.
  • The prior parity-coverage gap for stage R is fixed: the end-to-end parity suite now asserts R on all four DPI stages.
  • One P1 remains: the new mass-point guard only runs when boundary > 0, so the default boundary=0.0 path can still silently send d_min > 0 mass-point designs down the wrong nonparametric Design 1′ branch.
  • weights= unsupported and the public wrapper’s restricted HAD-only scope are documented deviations in the registry and are not defects.
  • I could not run the test suite in this environment because pytest and numpy are unavailable, so this is a static re-review.

Methodology

  • Severity P1. Impact: the prior mass-point applicability finding is only partially resolved. The new guard in mse_optimal_bandwidth() only checks for bunching when boundary > _boundary_tol, so it catches boundary=d_min>0 inputs but not the default boundary=0.0 call path. As a result, a dataset with d_min > 0 and >2% mass at min(d) can still silently receive a Design 1′ CCF bandwidth instead of the Design 1 mass-point 2SLS redirection required by the registry’s split between Design 1′, Design 1 mass-point, and Design 1 continuous-near-d_lower. See mse_optimal_bandwidth() in diff_diff/local_linear.py:L608, especially the positive-boundary-only mass-point branch at diff_diff/local_linear.py:L646, the default boundary=0.0 API at diff_diff/local_linear.py:L535, the acceptance test for boundary=0 below min(d) at tests/test_bandwidth_selector.py:L315, and the methodology split at docs/methodology/REGISTRY.md:L2172, docs/methodology/REGISTRY.md:L2228, and docs/methodology/REGISTRY.md:L2246.
    Concrete fix: make the default boundary=0.0 path classify or reject d_min>0 inputs instead of silently treating them as Design 1′. The cleanest fix is to apply the registry’s planned rule here: if min(d) < 0.01 * median(d), allow Design 1′; else if modal-min fraction >2%, raise the mass-point 2SLS NotImplementedError; else require boundary=d.min() for the continuous-near-d_lower path.

  • Severity P3. Impact: none. The Phase 1b deviations are properly documented: weights= is unsupported, and the exported wrapper is intentionally limited to p=1, deriv=0, interior=False, vce="nn", nnmatch=3, while broader private-path settings are not parity-tested. See docs/methodology/REGISTRY.md:L2305 and the wrapper docstring at diff_diff/local_linear.py:L517. Concrete fix: none.

Code Quality

No findings. The earlier raw LinAlgError path is now converted into a clear ValueError in qrXXinv(), and the stage-specific minimum-count guards are present in lprobust_bw(). See diff_diff/_nprobust_port.py:L111 and diff_diff/_nprobust_port.py:L368.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The blocking methodology issue above is not tracked in TODO.md:L51, so it remains unmitigated.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the selector-level regression for rank-deficient duplicate-support windows is still not actually pinned. The new test only proves that qrXXinv() raises on a hand-made singular matrix, not that a duplicate-support dataset flowing through lprobust_bw() / lpbwselect_mse_dpi() / mse_optimal_bandwidth() surfaces the intended ValueError rather than a different failure. See the catch in diff_diff/_nprobust_port.py:L137, the stage calls in diff_diff/_nprobust_port.py:L714, and the current test at tests/test_bandwidth_selector.py:L362. Concrete fix: add an integration test where the sample has enough rows but too few distinct in-window d values for one pilot stage, and assert that lpbwselect_mse_dpi() and the public wrapper raise ValueError.

Execution note: I could not run pytest here because the review environment lacks both pytest and numpy.

Path to Approval

  1. Update mse_optimal_bandwidth() so the default boundary=0.0 path does not silently accept d_min>0 datasets. Either implement the registry’s Design 1′ / mass-point / continuous-near-d_lower split in this wrapper, or hard-fail ambiguous boundary=0 inputs with d_min>0 and force the caller to choose the correct design explicitly.
  2. Add a regression test where boundary=0.0, d.min()>0, and the modal-min fraction exceeds 2%; it should raise the mass-point NotImplementedError rather than return a bandwidth.
  3. Add a companion regression test where boundary=0.0, d.min()>0, and there is no mass point; it should either apply the documented Design 1′ heuristic or raise a clear error instructing the caller to pass boundary=d.min().
  4. Add a selector-level duplicate-support test that drives a singular in-window design through lpbwselect_mse_dpi() / mse_optimal_bandwidth() and asserts the new ValueError contract.

… test

P1 (methodology): The boundary=0 default path now classifies inputs
per the REGISTRY-planned design="auto" rule. Previously the
mass-point guard only fired for boundary>0, so a dataset with
d.min()>0 and mass at d.min() would silently pass through the
default-boundary path as Design 1'. The new rule:

  - boundary=0 with min(d) < 0.01 * median(d): Design 1' accepted
    (support infimum effectively at 0).
  - boundary=0 with min(d) >= 0.01 * median(d):
      * modal fraction at min(d) > 2%: mass-point design -> raise
        NotImplementedError pointing to the 2SLS / Phase 2 path.
      * otherwise: ambiguous -> raise ValueError asking the caller
        to pass boundary=d.min() for the Design 1 continuous-
        near-d_lower path.
  - boundary>0 path unchanged (mass-point check already in place).

Removed the stale test_boundary_below_min_d_accepted: it used
U(0.01, 1) data which doesn't satisfy Design 1' under the stricter
rule. Replaced with three targeted tests:

  - test_boundary_zero_design_1_prime_accepted: U(0, 1) passes.
  - test_boundary_zero_with_positive_d_min_rejected: U(0.5, 1)
    raises "Ambiguous design".
  - test_boundary_zero_with_d_min_mass_point_rejected: mass at 0.1
    with boundary=0 raises mass-point NotImplementedError.

P3 (tests): Added test_full_stack_rank_deficient_raises_valueerror
driving a 3-distinct-value dataset through both lpbwselect_mse_dpi
and the public wrapper; both must raise ValueError (or
NotImplementedError), never LinAlgError or IndexError.

172 tests pass (up from 169).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a29a8fa9ffce2f659993a33600c9984c3e484565


Overall Assessment
⚠️ Needs changes — two unmitigated P1 methodology issues remain in the new public bandwidth-selector wrapper.

Executive Summary

  • The previous blocker is fixed: the default boundary=0 path no longer silently treats obvious d_min>0 datasets as Design 1′.
  • The new wrapper now hard-applies the Phase 2 design="auto" heuristic on the Phase 1b public API, which can reject valid Design 1′ finite samples and is not documented in the new Phase 1b registry note.
  • The wrapper still accepts off-support lower boundaries (boundary < d.min()), which are not any documented HAD estimand and can silently target the wrong limit.
  • The nprobust port itself looks faithful, and the documented deviations (weights= unsupported, public API restricted to the HAD Phase 1b surface) are properly recorded.
  • The earlier duplicate-support test gap is only partially closed: the private selector is covered, but the public-wrapper regression still does not pin the intended error contract.
  • Static re-review only: pytest is unavailable in this environment.

Methodology

  • Severity P1. Impact: mse_optimal_bandwidth() now hard-fails the default boundary=0 path whenever d.min() > 0.01 * median(d) and there is no mass point, via the new heuristic at diff_diff/local_linear.py:673 and diff_diff/local_linear.py:693. That rule is the still-unshipped Phase 2 design="auto" heuristic in the registry at docs/methodology/REGISTRY.md:2309, not part of the documented Phase 1b public contract at docs/methodology/REGISTRY.md:2305. Methodologically, the nonparametric Design 1′ path is still the paper’s intended path for support-at-zero designs, including thin-boundary-density simulations like the committed Beta(2,2) DGP class described at benchmarks/R/generate_nprobust_golden.R:195 and discussed at docs/methodology/REGISTRY.md:2231. Hard-rejecting those samples in Phase 1b is an undocumented default-behavior change and can produce production errors on valid inputs. Concrete fix: do not make the 1%/2% heuristic a hard gate on the Phase 1b public API unless you also add an explicit design=/override contract and document it in REGISTRY.md; otherwise, boundary=0 needs a supported explicit Design 1′ path.

  • Severity P1. Impact: boundary validation is still too loose. The wrapper only rejects boundary > d.min() at diff_diff/local_linear.py:625, so any lower off-support boundary still passes and is forwarded directly to lpbwselect_mse_dpi(..., eval_point=boundary) at diff_diff/local_linear.py:708. Downstream, local_linear_fit will also use that same off-support boundary with the one-sided kernel at diff_diff/local_linear.py:347. For HAD, the documented nonparametric estimands are at 0 (Design 1′) or at d̲ = min(D_2) (Design 1 continuous-near-) per docs/methodology/REGISTRY.md:2179 and docs/methodology/REGISTRY.md:2240. Accepting boundary < d.min() is therefore an undocumented off-support extrapolation path that can silently return the wrong bandwidth target. Concrete fix: require boundary to equal 0 or d.min() within tolerance, and reject all other off-support values with a targeted error.

  • Severity P3. Impact: none. The Phase 1b deviations for unsupported weights= and the restricted HAD-only public surface are documented at docs/methodology/REGISTRY.md:2305 and in the wrapper docstring starting at diff_diff/local_linear.py:517. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings beyond the documented Phase 1b deferrals already recorded in the registry.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the duplicate-support regression is still only partially pinned at the public-wrapper level. The private path is now exercised, but test_full_stack_rank_deficient_raises_valueerror() in tests/test_bandwidth_selector.py:397 allows either ValueError or NotImplementedError, so the new design guard can short-circuit before the wrapper actually exercises the qrXXinv/lprobust_bw failure contract. Concrete fix: add a public-wrapper regression using boundary=float(d.min()) on a continuous-near-d_lower duplicate-support sample and assert the specific ValueError contract.

  • Execution note: pytest is not installed in this review environment, so this re-review is static.

Path to Approval

  1. Decide and document the public contract for boundary=0 on d.min()>0 samples. If you want auto-classification, expose an explicit override (for example design=) and record the heuristic in REGISTRY.md; otherwise, remove the hard Phase 2 heuristic from the Phase 1b wrapper.
  2. Tighten boundary validation so the public wrapper accepts only boundary≈0 or boundary≈d.min(), and rejects any other lower off-support value.
  3. Add a regression test that an off-support lower boundary is rejected, e.g. d ~ U(1,2) with boundary=0.5.
  4. Add a regression test that covers the chosen boundary=0 contract on a thin-boundary-density Design 1′ sample, so valid support-at-zero inputs are not accidentally blocked.
  5. Add a wrapper-level duplicate-support regression with boundary=float(d.min()) and assert the specific ValueError surface from the selector path.

Reviewer correctly flagged that the 1%-of-median rule is a Phase 2
design="auto" heuristic, not Phase 1b. Backed off that over-reach.

P1 #1: Removed the min(d)/median(d) < 0.01 check. The mass-point
guard now applies uniformly (whenever d.min() > 0 and modal fraction
at d.min() > 2%) and does not gate on boundary. This still catches
the original concern (silently routing mass-point data through the
nonparametric branch) without rejecting valid Design 1' samples like
Beta(2,2) where d.min() is strictly positive but small.

P1 #2: Tightened boundary validation. The wrapper now accepts only
boundary ~ 0 (Design 1') or boundary ~ d.min() (Design 1
continuous-near-d_lower) within float tolerance. Off-support values --
including the previously-allowed "boundary < d.min()" path -- are
rejected with a targeted error message.
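A minimal sketch of that two-branch validation, with a hypothetical helper name and tolerance choices (the actual wrapper's tolerances may differ):

```python
import numpy as np

def validate_boundary(d: np.ndarray, boundary: float) -> str:
    """Accept only boundary ~ 0 (Design 1') or boundary ~ d.min()
    (Design 1 continuous-near-d_lower); reject everything else."""
    if np.isclose(boundary, 0.0, atol=1e-12):
        return "design_1_prime"
    if np.isclose(boundary, float(d.min())):
        return "design_1"
    raise ValueError(
        f"boundary={boundary} is off-support; pass 0.0 or float(d.min())"
    )

d = np.random.default_rng(1).uniform(1.0, 2.0, size=200)
print(validate_boundary(d, float(d.min())))  # design_1
# validate_boundary(d, 0.5)  # off-support value between 0 and d.min(): raises
```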

P3: Added a public-wrapper duplicate-support regression that drives a
rank-deficient X'X through the full selector stack (boundary =
d.min(), unique minimum, only 4 distinct d values) and asserts a
specific "qrXXinv" ValueError, not LinAlgError.

Test updates:
- Removed test_boundary_zero_with_positive_d_min_rejected: the case
  it modeled is now accepted (no mass point).
- Added test_boundary_zero_thin_boundary_density_accepted: Beta(2,2)
  Design 1' with vanishing boundary density now passes.
- Added test_off_support_boundary_rejected: boundary=0.5 on U(1,2).
- Added test_negative_boundary_rejected: boundary<0 rejected.
- Updated test_nonzero_boundary: uses boundary=float(d.min()), not
  boundary=1.0 (which is off the realized support of U(1,2)).

175 tests pass (up from 172).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8d357b6804a7b4641a1d080743a01f2f074679cd


Overall Assessment

Looks good — no unmitigated P0/P1 findings in the current diff.

Executive Summary

Methodology

  • No unmitigated P0/P1 findings.
  • Severity P3 (informational). Impact: none. The intentional Phase 1b deviations are documented: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are documented deviations, not defects. Concrete fix: none. docs/methodology/REGISTRY.md#L2310 diff_diff/local_linear.py#L517

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt found in the changed files. The remaining Phase 1b scope limitations are documented in the methodology registry rather than being silently deferred. docs/methodology/REGISTRY.md#L2310

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the public API docs understate the actual error contract of mse_optimal_bandwidth(). The implementation now raises ValueError for unsupported boundary choices and NotImplementedError for detected mass-point designs, but the docstring Raises section still only mentions shape/non-finite/kernel/weights failures. Concrete fix: extend the Raises section in diff_diff/local_linear.py#L564 to cover the boundary rejection at diff_diff/local_linear.py#L622 and the mass-point redirect at diff_diff/local_linear.py#L647; optionally mirror that contract in the Phase 1b registry note. docs/methodology/REGISTRY.md#L2310
  • Static-review note: the new parity and behavioral coverage is substantially better than in the prior rerun, including wrapper-level duplicate-support regression coverage. tests/test_bandwidth_selector.py#L426

CI AI review P3: the docstring only listed shape/finite/kernel/weights
raises but the implementation now also raises on boundary off-support,
mass-point, bwcheck out-of-range, and per-stage rank/count failures.
Document the full contract.

175 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7277ec86e8b1dd4122db382aaa2e848bb99d8538


Overall Assessment

Looks good — no unmitigated P0/P1 findings in the current diff.

Executive Summary

Methodology

  • Severity P3 (informational). Impact: none. The intentional Phase 1b deviations are documented: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are documented deviations rather than defects. Concrete fix: none. docs/methodology/REGISTRY.md:2310 diff_diff/local_linear.py:517

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt found in the changed files.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the boundary parameter docs no longer match the actual wrapper contract. The docstring still says callers should pass d_lower for continuous-near-d_lower designs, but the implementation and tests now require the sample minimum d.min() for that path; users following the current docs can hit an avoidable ValueError. Concrete fix: rewrite the boundary parameter docs (and nearby Notes text) to say the Phase 1b wrapper expects the sample estimate min(d) for the nonzero-boundary case. diff_diff/local_linear.py:535 diff_diff/local_linear.py:628 tests/test_bandwidth_selector.py:481
  • Severity P3. Impact: the public kernel surface is broader than the end-to-end parity anchor. The R golden generator and parity suite only cover kernel="epa"; triangular and uniform are only smoke-tested for dispatch, so two supported kernels lack the same R-backed regression anchor as the default path. Concrete fix: extend the golden generator and parity tests to at least one tri and one uni scenario. benchmarks/R/generate_nprobust_golden.R:191 tests/test_bandwidth_selector.py:455

CI AI review P3: the boundary parameter docs said callers could pass
d_lower (the theoretical support lower bound), but the implementation
requires boundary = float(d.min()) (the sample minimum). Users
following the old docs on e.g. U(1, 2) data with boundary=1.0 would
hit an avoidable ValueError. Docstring now explicitly says use the
sample minimum.

Deferred (P3): extending R golden parity to triangular and uniform
kernels. All three kernels go through the same lprobust_bw code path,
so epa parity transitively covers dispatch; tri/uni parity is a
nice-to-have regression anchor but not a correctness gap.

175 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 044baedd98f55aa192a2b0ce829dd4f7f69c4b02


Overall Assessment

⚠️ Needs changes — one unmitigated P1 methodology issue.

Executive Summary

  • The affected method is the Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selector, plus the new HAD-specific public wrapper around it.
  • The core _nprobust_port translation is well source-mapped, and the explicit Phase 1b deviations (weights= unsupported; public wrapper restricted to p=1, deriv=0, interior=False, vce="nn", nnmatch=3) are documented in docs/methodology/REGISTRY.md:L2310, so I did not count those as defects.
  • Prior re-review P3 on the boundary docs looks resolved: the public docstring now says the nonzero-boundary path expects d.min().
  • P1: mse_optimal_bandwidth() never enforces the HAD support restriction D_{g,2} >= 0. Negative-dose samples can still pass through both accepted boundary branches and drive the selector on a non-HAD problem with no warning. diff_diff/local_linear.py:L625-L677, docs/methodology/REGISTRY.md:L2135-L2145
  • P3: the R-backed parity harness is still limited to kernel="epa" and eval_point=0; triangular, uniform, and the boundary=d.min() path are smoke-tested only. benchmarks/R/generate_nprobust_golden.R:L191-L199, tests/test_bandwidth_selector.py:L455-L491
  • Static review only: I could not execute the new suite here because this environment is missing numpy.

Methodology

Affected methods: diff_diff._nprobust_port.lpbwselect_mse_dpi and the HAD public wrapper diff_diff.local_linear.mse_optimal_bandwidth.

  • Severity P1. Impact: the public wrapper is documented as HAD-only, but it never checks the source-material requirement that period-2 doses satisfy D_{g,2} >= 0. If any d < 0 slips in, boundary=0 is accepted via _at_zero, and the symmetric nprobust kernel code can calibrate h_mse on an interior two-sided problem while the downstream fitter is explicitly one-sided on [boundary, boundary + h]; boundary=float(d.min()) also passes even though a negative is outside the documented HAD design. That is an unguarded assumption violation and can yield a plausible bandwidth with no warning. Concrete fix: add a front-door nonnegativity check in mse_optimal_bandwidth() before the _at_zero / _at_d_min routing, and add regression tests for negative-dose input under both boundary branches. diff_diff/local_linear.py:L66-L68, diff_diff/local_linear.py:L280-L297, diff_diff/local_linear.py:L625-L677, diff_diff/_nprobust_port.py:L78-L103, docs/methodology/REGISTRY.md:L2135-L2145, docs/methodology/papers/dechaisemartin-2026-review.md:L18-L29
  • Severity P3 (informational). Impact: none. The two intentional deviations called out by the PR are documented in the registry: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, I did not count those as defects. Concrete fix: none. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L517-L527, diff_diff/local_linear.py:L547-L550

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new blocker-level tech debt found in the changed files.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the committed golden/parity harness still anchors only the default kernel="epa" / eval_point=0 path. The additional supported public branches kernel in {"triangular","uniform"} and boundary=float(d.min()) are only smoke-tested, so regressions there would not be caught by the R-backed reference suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one tri, one uni, and one shifted-boundary scenario, regenerate benchmarks/data/nprobust_mse_dpi_golden.json, and assert those values in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:L191-L199, tests/test_bandwidth_selector.py:L47-L73, tests/test_bandwidth_selector.py:L455-L491
  • Severity P3. Impact: _nprobust_port.py still says weighted data can be handled by the public wrapper, but the new public wrapper explicitly raises NotImplementedError on weights=. Concrete fix: align the private-module docstring with the public API and the registry note. diff_diff/_nprobust_port.py:L34-L36, diff_diff/local_linear.py:L547-L550, diff_diff/local_linear.py:L599-L604
  • Static review only: I could not run pytest here because the environment lacks numpy.

Path to Approval

  1. In diff_diff/local_linear.py, reject negative-dose input in mse_optimal_bandwidth() before the boundary/design routing so the HAD-only wrapper cannot run on D_{g,2} < 0 data.
  2. Add regression tests that assert negative d raises under both boundary=0.0 and boundary=float(d.min()).

CI AI review P1: mse_optimal_bandwidth did not enforce the HAD
support requirement D_{g,2} >= 0. Negative-dose samples could
silently pass through both boundary branches: boundary=0 (accepted
via _at_zero even with d_min<0) and boundary=float(d.min()) (accepts
any lower edge). The symmetric nprobust kernel would happily
calibrate a two-sided interior bandwidth while the downstream
one-sided fitter runs on [boundary, boundary+h] -- silent
assumption violation.

Front-door check added: np.any(d < 0) raises ValueError with a
message citing the paper's support assumption.

Two new regression tests:
- test_negative_dose_rejected_boundary_zero: d ~ U(-0.5, 0.5) with
  boundary=0 raises.
- test_negative_dose_rejected_boundary_at_d_min: d ~ U(-1, -0.1)
  with boundary=d.min() raises.
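The front-door check amounts to a one-line guard; a hedged sketch (function name hypothetical) covering both regression cases:

```python
import numpy as np

def check_nonnegative_doses(d: np.ndarray) -> None:
    """HAD support assumption: period-2 doses must satisfy D_{g,2} >= 0."""
    if np.any(d < 0):
        raise ValueError("negative doses violate the HAD support assumption D >= 0")

rng = np.random.default_rng(4)
# Both boundary branches are protected by the same front-door check:
for d in (rng.uniform(-0.5, 0.5, 200), rng.uniform(-1.0, -0.1, 200)):
    try:
        check_nonnegative_doses(d)
    except ValueError:
        print("rejected")
```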

Deferred (P3 same as last round): tri/uni/shifted-boundary golden
parity extension. All three kernels share lprobust_bw, so epa parity
transitively covers kernel dispatch.

177 tests pass (up from 175).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 196f91040a07bc19e7b3e8b07bb4bf47d0a919f1


Overall Assessment

⚠️ Needs changes — two unmitigated P1 findings remain.

Executive Summary

Methodology

Code Quality

  • Severity P1. Impact: the new wrapper does not handle empty inputs explicitly. With d=np.array([]), y=np.array([]), shape and finiteness checks pass, then the code falls into d.min() / d.max() and throws a low-level reduction error instead of a targeted validation failure. That is a production edge-case miss under the project’s empty-result-set checklist. Concrete fix: add a front-door if d.size == 0: raise ValueError("d and y must be non-empty") before any reductions, and add a regression test for empty arrays. diff_diff/local_linear.py:614, diff_diff/local_linear.py:654

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the golden/parity harness still only anchors the default kernel="epa" / eval_point=0 path. The additional public branches kernel in {"triangular","uniform"} and boundary=float(d.min()) are only smoke-tested, so a future regression there would not be caught by the R-backed reference suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one tri, one uni, and one shifted-boundary scenario, regenerate benchmarks/data/nprobust_mse_dpi_golden.json, and assert those cases in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:191, tests/test_bandwidth_selector.py:476, tests/test_bandwidth_selector.py:498
  • Severity P3. Impact: _nprobust_port.py still says weighted data can be handled by the public wrapper, but the public wrapper explicitly raises NotImplementedError on weights=. That leaves the private-module docs out of sync with the actual API and with the registry note. Concrete fix: change the _nprobust_port module docstring to say the public wrapper does not support weights in Phase 1b. diff_diff/_nprobust_port.py:34, diff_diff/local_linear.py:547, docs/methodology/REGISTRY.md:2310
  • Static review only: this environment lacks numpy and pytest, so I could not run the new selector tests.

Path to Approval

  1. In mse_optimal_bandwidth(), reject boundary≈0 whenever d.min() is not also approximately zero. For d.min()>0, direct callers either to boundary=float(d.min()) (continuous-near-d_lower) or to the future mass-point 2SLS path. Add a regression test with continuous d ~ Uniform(0.01, 1.0) that must fail front-door at boundary=0.
  2. Add an explicit non-empty input guard before d.min() / d.max(), and add a regression test on empty d/y asserting a targeted ValueError.

P1 #1: boundary=0 now enforces a Design 1' support plausibility
heuristic: d.min() <= 5% * median(|d|). Samples with d.min()
substantially positive (e.g. U(0.5, 1)) are rejected with ValueError
directing the caller to boundary=float(d.min()). Threshold chosen
at 5% (not REGISTRY's 1%) so the paper's thin-boundary-density
DGPs (Beta(2,2), d.min/median ~ 3%) still pass. Reordered so the
mass-point check (NotImplementedError, paper Section 3.2.4) fires
before the support-check -- mass-point data should be redirected
to 2SLS regardless of the boundary the caller picked.
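A sketch of the 5%-of-median plausibility heuristic described above (helper name is hypothetical, and deterministic stand-in samples replace the paper's Beta(2,2) DGP):

```python
import numpy as np

def check_design1prime_support(d: np.ndarray, frac: float = 0.05) -> None:
    """Plausibility check for boundary=0 (Design 1'): the smallest
    observed dose should be within `frac` of the median absolute dose."""
    if d.min() > frac * np.median(np.abs(d)):
        raise ValueError(
            "d.min() is far from 0; pass boundary=float(d.min()) for Design 1"
        )

# Thin boundary density (stand-in for Beta(2,2)): min/median ~ 2% < 5%, passes.
check_design1prime_support(np.linspace(0.01, 1.0, 1000))

# Support well above zero (like U(0.5, 1)): min/median ~ 67%, rejected.
try:
    check_design1prime_support(np.linspace(0.5, 1.0, 500))
except ValueError:
    print("rejected: use boundary=float(d.min())")
```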

P1 #2: Empty-input front-door guard. d.size == 0 raises ValueError
with a targeted "must be non-empty" message instead of leaking
the NumPy reduction error from d.min().

P3 (docstring sync): _nprobust_port module docstring no longer says
weighted data can be handled by the public wrapper -- the wrapper
explicitly raises NotImplementedError. Docstring now matches the
actual contract.

P3 (deferred, same as last round): tri/uni/shifted-boundary golden
parity extension.

REGISTRY.md Phase 1b note expanded to document the full input
contract (nonnegativity, boundary applicability, Design 1' support
heuristic, mass-point redirection) so the public API surface is
fully specified in the methodology registry.

178 tests pass (up from 177).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a313e05b17d95d0d23a37d28577268a5d0b1cda4


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

Methodology

Code Quality

  • Severity P1 [Newly identified]. Impact: lpbwselect_mse_dpi() accepts x, y, and cluster without validating dimensionality or matching lengths, then default vce="nn" logic sorts x and applies the same integer indexer to y and cluster. If y or cluster is longer than x, the extra entries are silently dropped and the bandwidth is computed on misaligned data with no warning. Because both the wrapper docstring and the registry explicitly direct advanced callers to use this port for the broader surface, this is a real API-contract bug, not just an internal-helper quirk. Concrete fix: at the top of lpbwselect_mse_dpi(), ravel() x/y, reject empty inputs, require x.shape == y.shape, require finite values, and require cluster is None or cluster.shape == x.shape before the argsort block; add regression tests for len(y) > len(x), len(cluster) > len(x), and empty direct-port inputs. diff_diff/_nprobust_port.py:L602-L683, diff_diff/local_linear.py:L517-L527, docs/methodology/REGISTRY.md:L2310, tests/test_bandwidth_selector.py:L208-L284

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the committed golden generator only emits kernel="epa" / eval_point=0 scenarios, while the non-default public branches (triangular, uniform, boundary=float(d.min())) are only behaviorally smoke-tested in Python. A regression in those branches would not be caught by the R-backed parity suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one triangular case, one uniform case, and one shifted-boundary case; regenerate benchmarks/data/nprobust_mse_dpi_golden.json; and assert those cases in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:L180-L199, tests/test_bandwidth_selector.py:L480-L515

Static review only: numpy and pytest are unavailable in this environment, so I could not execute tests/test_nprobust_port.py or tests/test_bandwidth_selector.py.

Path to Approval

  1. Add front-door contract validation to diff_diff._nprobust_port.lpbwselect_mse_dpi() for 1-D, non-empty, finite, same-length x/y, plus same-length cluster when provided, before the argsort / reindex block.
  2. Add regression tests that cover the current failure modes: y longer than x with default vce="nn", cluster longer than x with default vce="nn", and empty direct-port inputs; assert targeted ValueErrors instead of silent truncation or raw NumPy errors.

CI AI review P1: the port's lpbwselect_mse_dpi is advertised as the
advanced-use entry point for callers outside the HAD Phase 1b
wrapper surface. It accepted x, y, cluster without validating
shapes, and the default vce="nn" branch reindexed y/cluster by
argsort(x) -- so a longer y would be silently truncated to match
x's length, producing a bandwidth on misaligned data with no
warning.

Added front-door validation at the top of lpbwselect_mse_dpi:
- x and y ravel()ed and required to have the same shape
- x must be non-empty
- x, y, and eval_point must be finite
- cluster (if supplied) must match x.shape
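A self-contained sketch of that validation order, using a hypothetical helper name (the actual checks sit inline at the top of lpbwselect_mse_dpi):

```python
import numpy as np

def validate_port_inputs(x, y, cluster=None, eval_point=0.0):
    """Front-door contract checks run before any argsort/reindex step."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.size == 0:
        raise ValueError("x must be non-empty")
    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same shape: {x.shape} vs {y.shape}")
    if not (np.isfinite(x).all() and np.isfinite(y).all() and np.isfinite(eval_point)):
        raise ValueError("x, y, and eval_point must be finite")
    if cluster is not None:
        cluster = np.asarray(cluster).ravel()
        if cluster.shape != x.shape:
            raise ValueError("cluster must match x.shape")
    return x, y, cluster

# Silent-truncation regression: a longer y must be rejected, never reindexed.
x = np.linspace(0.0, 1.0, 10)
y = np.linspace(0.0, 1.0, 12)
try:
    validate_port_inputs(x, y)
except ValueError as e:
    print(e)
```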

Seven new tests in TestLpbwselectMseDpiValidation:
- test_mismatched_shapes_raise
- test_longer_y_silent_truncation_rejected (regression for the
  specific nn-reindex bug)
- test_cluster_wrong_length_rejected
- test_empty_direct_port_input_rejected
- test_non_finite_x_rejected
- test_non_finite_y_rejected
- test_non_finite_eval_point_rejected

185 tests pass (up from 178).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

Prior commit moved shape / emptiness / finiteness checks into the
port but left bwcheck validation above them. As a result, empty or
non-finite inputs got "bwcheck exceeds sample size" errors instead
of the targeted contract messages the tests expect. Reorder so
input-shape validation runs first, then bwcheck, then kernel/vce.
Drop duplicate N assignment.

185 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5de3634d36b0ceb2da917a82f320cd3fbd26faa9


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

  • Affected methods: the PR adds the Calonico-Cattaneo-Farrell MSE-DPI boundary-bandwidth selector, exposed via diff_diff/local_linear.py:L495 and implemented in diff_diff/_nprobust_port.py:L602, for the HAD setting documented in the Methodology Registry and de Chaisemartin et al.
  • The prior re-review blocker is fixed: _nprobust_port.lpbwselect_mse_dpi() now rejects mismatched/empty x/y and wrong-length cluster before the argsort/reindex path, with regression coverage added. diff_diff/_nprobust_port.py:L640, tests/test_nprobust_port.py:L251
  • Severity P1 [Newly identified]: the advanced-use clustered path still has incomplete missing-data handling. cluster is only shape-checked, but clustered meat construction later groups by cluster == c; missing cluster IDs can therefore alter clustered stage-V calculations and the selected bandwidth instead of being dropped like nprobust or rejected up front. diff_diff/_nprobust_port.py:L658, diff_diff/_nprobust_port.py:L301.
  • The intentional deviations called out in the PR are properly documented: weights= remains unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are P3 informational, not blockers. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L495.
  • Static review only: pytest and numpy are unavailable in this environment, so I could not execute the new test suite.

Methodology

  • Severity P1 [Newly identified]. Impact: the new front-door validation in diff_diff/_nprobust_port.py:L658 enforces shape/emptiness/finiteness for x and y, but not missingness for cluster. Once a clustered call reaches diff_diff/_nprobust_port.py:L301, rows are assigned to blocks via cluster == c; missing cluster IDs can therefore either drop out of every block or form a spurious cluster, silently changing the clustered meat and hence the DPI bandwidth. The reference nprobust::lpbwselect() does complete-case filtering on x, y, and cluster before dispatch, so this is an undocumented source-behavior mismatch on the documented advanced-use path. Concrete fix: in lpbwselect_mse_dpi(), either mirror nprobust's complete-case filtering for x/y/cluster before sorting, or reject missing cluster IDs up front with a targeted ValueError; then add a regression test for clustered calls with missing cluster labels. diff_diff/_nprobust_port.py:L602, diff_diff/_nprobust_port.py:L658, diff_diff/_nprobust_port.py:L301.
  • Severity P3. Impact: none. The wrapper's HAD-only scope restrictions and weights= rejection are explicitly documented in the Registry and docstrings, so they are not methodology defects under the review rubric. Concrete fix: none. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L495.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the new validation tests cover wrong-length cluster, but they do not pin the missing-cluster case that still breaks source parity on the clustered advanced-use path, so the blocker above would regress unnoticed. Concrete fix: add direct-port tests with clustered calls where cluster contains np.nan and None, and assert either R-style complete-case dropping or the documented targeted error. tests/test_nprobust_port.py:L251
  • Static review only: I could not run tests/test_nprobust_port.py or tests/test_bandwidth_selector.py because pytest/numpy are not installed in this environment.

Path to Approval

  1. In diff_diff/_nprobust_port.py:L602, handle missing cluster IDs before any sort/reindex or clustered lprobust_vce() call. Preferred: mirror nprobust complete-case filtering. Acceptable fallback: reject missing cluster IDs explicitly.
  2. Add regression coverage for clustered advanced-use calls with missing cluster labels in tests/test_nprobust_port.py:L251.
  3. If the project intentionally chooses stricter erroring instead of nprobust-style row dropping, document that deviation in docs/methodology/REGISTRY.md and the _nprobust_port.py docstring so it is no longer an undocumented behavior mismatch.

CI AI review P1: the advanced-use lpbwselect_mse_dpi accepted
cluster arrays with NaN / None silently. Downstream, clustered
meat construction groups rows via cluster == c; NaN compares
unequal to itself, so missing-cluster rows would drop from every
block and the meat matrix would change silently, altering the
selected bandwidth.

nprobust::lpbwselect does complete-case filtering on (x, y, cluster)
before dispatch. This port deliberately chooses the stricter
"reject" policy consistent with the library's no-silent-failures
convention: missing cluster IDs raise ValueError up front. The
"reject vs. filter" deviation from nprobust is documented in the
module docstring Deviations list.

Handles both NaN in float arrays and None in object-dtype arrays.
Two new regression tests:
- test_missing_cluster_id_rejected_nan
- test_missing_cluster_id_rejected_none
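The reject policy can be sketched as follows (helper name hypothetical; note that NaN != NaN is exactly why a NaN-labeled row would silently match no `cluster == c` block downstream):

```python
import numpy as np

def check_cluster_ids(cluster) -> None:
    """Reject missing cluster IDs up front (NaN in float arrays,
    None in object-dtype arrays) instead of silently dropping rows."""
    c = np.asarray(cluster)
    if c.dtype == object:
        missing = any(v is None or (isinstance(v, float) and np.isnan(v)) for v in c)
    elif np.issubdtype(c.dtype, np.floating):
        # NaN compares unequal to itself, so NaN rows vanish from every block.
        missing = bool(np.isnan(c).any())
    else:
        missing = False
    if missing:
        raise ValueError("cluster contains missing IDs; drop or relabel them first")

check_cluster_ids(np.array([1, 1, 2, 2]))  # integer labels: fine
try:
    check_cluster_ids(np.array([1.0, np.nan, 2.0]))
except ValueError:
    print("NaN cluster rejected")
```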

187 tests pass (up from 185).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: aca17caccad0f3260d3720798ac9028497aad592


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

  • Affected method: the PR adds the Calonico-Cattaneo-Farrell MSE-DPI bandwidth selector via diff_diff.mse_optimal_bandwidth and the new private port in diff_diff/local_linear.py:L495 and diff_diff/_nprobust_port.py:L610.
  • Re-review status: the prior missing-cluster-ID finding appears fixed. The port now rejects wrong-length and missing cluster inputs up front in diff_diff/_nprobust_port.py:L666, with regression coverage in tests/test_nprobust_port.py:L320.
  • Severity P1 [Newly identified]: the clustered lprobust_vce() path is not a faithful port of the cited nprobust source. The Python code returns w * M in diff_diff/_nprobust_port.py:L351, while the cited R source computes the same finite-sample factor w but returns M; lprobust.bw uses that result directly in the stage-V calculations. This changes clustered pilot/final bandwidths for direct _nprobust_port callers and future clustered integrations. (rdrr.io)
  • The documented Phase 1b deviations are properly recorded and are not defects under the rubric: weights= rejection, HAD-only public wrapper scope, and the wrapper’s input-contract heuristics are all explicitly noted in docs/methodology/REGISTRY.md:L2310.
  • Static review only: I could not run the new tests in this environment because numpy/pytest are not installed.

Methodology

  • Severity P1 [Newly identified]. Impact: the clustered advanced-use path in lprobust_vce() does not match the cited nprobust implementation. In the PR, the clustered branch computes w and returns w * M at diff_diff/_nprobust_port.py:L351. In the cited nprobust source, lprobust.vce computes the same w, accumulates M, and returns M; lprobust.bw then plugs that return value directly into V.V and V.B. Because diff_diff/_nprobust_port.py:L471 and diff_diff/_nprobust_port.py:L537 feed lprobust_vce() straight into the DPI stages, clustered calls to _nprobust_port.lpbwselect_mse_dpi(..., cluster=...) will silently produce different stage variances and bandwidths from the claimed reference implementation. The public Phase 1b wrapper is unaffected today because it hard-codes cluster=None at diff_diff/local_linear.py:L742. Concrete fix: either make the clustered branch match the cited source exactly, or explicitly document the clustered finite-sample correction as a **Note (deviation from R):** in the registry and module docstring and narrow the “faithful/exact port” wording accordingly. (rdrr.io)
  • Severity P3. Impact: none. The wrapper’s weights= rejection, HAD-only exported scope, and boundary/input-contract restrictions are explicitly documented in the Phase 1b registry entry, so they are informational rather than defects under this rubric. Concrete fix: none. See docs/methodology/REGISTRY.md:L2310 and diff_diff/local_linear.py:L517.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not find a TODO.md entry that already tracks the clustered lprobust_vce mismatch.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Resolve the clustered lprobust_vce() mismatch in diff_diff/_nprobust_port.py:L351: either match the cited nprobust source exactly, or explicitly record the clustered finite-sample scaling as a **Note (deviation from R):** in docs/methodology/REGISTRY.md:L2310 and the module docstring.
  2. Add a clustered regression/parity test that exercises cluster != None through the DPI path so this variance/bandwidth behavior is pinned, rather than relying only on vce="nn" golden tests and cluster-missing validation.
  3. Update the stale 1% comment in tests/test_bandwidth_selector.py:L337 to the implemented 5% rule.

CI AI review P1: my port of lprobust.vce multiplied the accumulated
cluster meat by w = ((n-1)/(n-k)) * (g/(g-1)). But the R source
(npfunctions.R:165-185) computes w but does NOT apply it to the
returned M -- w is dead code in the R function. My w factor would
silently inflate clustered stage variances and drift the DPI
bandwidths away from nprobust.

The public wrapper is unaffected (it hard-codes cluster=None), but
the documented advanced-use entry point lpbwselect_mse_dpi supports
cluster= and would have produced non-faithful results.

Fix: drop the w multiplier so lprobust_vce returns the raw
sum-of-outer-products of cluster scores, matching R. A comment
documents that w is computed-but-unused in R so future maintainers
don't reintroduce the multiplier thinking it was a port omission.
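A minimal sketch of the unscaled cluster-summed meat described above (function name hypothetical; the actual port's interface differs):

```python
import numpy as np

def clustered_meat(scores: np.ndarray, cluster: np.ndarray) -> np.ndarray:
    """Raw cluster-summed meat: sum_g s_g s_g', with s_g the within-cluster
    sum of score rows. No ((n-1)/(n-k)) * (g/(g-1)) factor -- in the R
    source that w is computed but never applied to the returned M."""
    k = scores.shape[1]
    M = np.zeros((k, k))
    for c in np.unique(cluster):
        s_g = scores[cluster == c].sum(axis=0)
        M += np.outer(s_g, s_g)
    return M

rng = np.random.default_rng(3)
S = rng.normal(size=(12, 2))          # 12 score rows, 2 columns
g = np.repeat([0, 1, 2, 3], 3)        # 4 clusters of 3 rows each
M = clustered_meat(S, g)
assert np.allclose(M, M.T)            # meat is symmetric
```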

Three new tests in TestLprobustVceClustered:
- test_clustered_meat_matches_unscaled_sum: bit-exact vs manual
  sum without the w factor
- test_clustered_is_symmetric: meat is symmetric
- test_clustered_end_to_end_through_lprobust_bw: smoke-tests the
  clustered DPI path end-to-end

P3 (stale comment): test_boundary_zero_with_data_far_from_zero_rejected
docstring and comment referenced "1% of median" but the implemented
rule is 5%. Aligned with the code.

P3 (same as last 4 rounds): tri/uni kernel golden parity extension.

190 tests pass (up from 187).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: be2a6438b98b8b34d841c2b5a0994734a4c37081


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify any remaining issue in this PR that needed downgrade via the review-debt tracker in TODO.md:51.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
Previous phrasing "parity verified at 0.0000%" reads as "0% agreement"
at a glance. What we actually have is 0.0000% relative error -- i.e.,
bit-parity within float64 precision. Clearer wording.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 19, 2026

/ai-review

@igerber igerber removed the ready-for-ci Triggers CI test workflows label Apr 19, 2026
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 795b461f9b19fb3d097180e7aba395ef50821cd3


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review focus: the prior clustered lprobust_vce() issue appears resolved. The clustered branch in diff_diff/_nprobust_port.py#L338 now returns the unscaled cluster-summed meat, matching the nprobust source where the finite-sample factor w is computed but not applied to M. (rdrr.io)
  • Affected methods: this PR adds the Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selector in the private port diff_diff/_nprobust_port.py#L610 and the HAD-facing public wrapper diff_diff/local_linear.py#L495. The staged q+1, q+2, q, p call structure and the closed-form odd-(p-deriv) branch match the reference nprobust implementation and the paper’s Design 1′ / Design 1 split. (arxiv.org)
  • The wrapper’s extra guardrails (weights rejection, boundary restriction, mass-point redirection, Design 1′ plausibility heuristic) are explicitly documented in docs/methodology/REGISTRY.md#L2310, so under the review rubric they are informational deviations, not blockers.
  • Edge-case coverage is materially stronger. The PR adds targeted validation for empty/non-finite inputs, bwcheck range, boundary applicability, rank-deficient pilot windows, and clustered-meat regression anchors in tests/test_bandwidth_selector.py#L167 and tests/test_nprobust_port.py#L247.
  • Static review only: I could not execute the suite here because numpy is not installed in this environment.
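For orientation, the staged call structure mentioned above (four bandwidth-selection calls at orders q+1, q+2, q, p) can be caricatured as follows; the update rule and the way pilot bandwidths feed forward are placeholders only, not the port's actual computation:

```python
def lprobust_bw(order, pilot_bw):
    # Placeholder update; the real routine estimates variance (V) and
    # bias (B1, B2) constants at this polynomial order and returns an
    # MSE-optimal bandwidth from them.
    return pilot_bw * (1.0 + 0.1 * order)

def mse_dpi_sketch(p=1, q=2, rot_bw=0.5):
    # stage 1: pilot bandwidths at orders q+1 and q+2 (feed-forward
    # shown here is illustrative only)
    bw_q1 = lprobust_bw(q + 1, rot_bw)
    bw_q2 = lprobust_bw(q + 2, rot_bw)
    # stage 2: bias-estimation bandwidth b at order q
    b = lprobust_bw(q, max(bw_q1, bw_q2))
    # stage 3: main bandwidth h at order p
    h = lprobust_bw(p, b)
    return h, b
```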

Methodology

  • Severity P3. Impact: none. The public wrapper in diff_diff/local_linear.py#L495 deliberately narrows raw nprobust behavior to the documented HAD Phase 1b surface: weights unsupported, boundary limited to 0 or d.min(), mass-point cases redirected away from the nonparametric path, and an extra Design 1′ support heuristic. Those choices are explicitly recorded in docs/methodology/REGISTRY.md#L2310, so they are informational rather than defects under the rubric. The underlying method split itself is consistent with the paper’s Design 1′ / mass-point / continuous-near-zero cases and with nprobust::lpbwselect.mse.dpi. Concrete fix: none. (arxiv.org)
  • No other methodology findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify any new deferrable item from this PR that needed to be added to TODO.md#L51.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
CI's "Copy tests to isolated location" step runs tests against the
installed package from /private/tmp/tests/, but does NOT copy
benchmarks/ alongside. My parity tests hard-failed with
FileNotFoundError when the golden JSON was not present at the
expected relative path.

Follow the established repo convention for golden-value fixtures:

    if not GOLDEN_PATH.exists():
        pytest.skip("Golden values file not found; run: Rscript ...")

Matches the pattern used by test_csdid_ported.py,
test_chaisemartin_dhaultfoeuille_parity.py, test_survey_real_data.py,
test_survey_estimator_validation.py, and test_linalg_hc2_bm.py.

Parity tests still run as hard gates:
- when invoked from the repo root (local /pre-merge-check,
  /ai-review-local, dev iteration)
- when benchmarks/ is present alongside tests/ in CI jobs

They skip in the isolated-install job where only tests/ is copied.
Net effect matches the shipping convention for all other
R-backed parity suites in this repo.
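In full, the guard looks roughly like this (the path resolution is illustrative; the real tests locate benchmarks/ according to the repo layout):

```python
import json
from pathlib import Path

import pytest

# Illustrative path; the actual tests resolve benchmarks/ relative to
# the repository root, not the current working directory.
GOLDEN_PATH = Path("benchmarks/data/nprobust_mse_dpi_golden.json")

def load_golden(path=GOLDEN_PATH):
    # Soft-skip when the fixture is absent (e.g. CI's isolated-install
    # job copies only tests/); runs as a hard gate whenever the golden
    # JSON is present.
    if not path.exists():
        pytest.skip("Golden values file not found; run: Rscript ...")
    return json.loads(path.read_text())
```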

190 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit ed94e43 into main Apr 19, 2026
19 checks passed
@igerber igerber deleted the did-no-untreated branch April 19, 2026 21:45
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-ready
  output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>